Claims 


We claim: 

1. A video conferencing system comprising:

an image pickup device for generating image signals representative of an image;

an audio pickup device for generating audio signals representative of sound from an audio
source; and

a multimodal integration architecture system for processing said image signals and said
audio signals to determine a direction of the audio source relative to a reference point.

2. The video conferencing system of claim 1 wherein said multimodal integration
architecture system further comprises:

an audio source localization system;

a computer vision person detection system; and

a multimodal speaker detection system.

3. The video conferencing system of claim 2, further comprising an integrated housing for an
integrated video conferencing system incorporating the image pickup device, the audio pickup
device, and the multimodal integration architecture system.

4. The video conferencing system of claim 3, wherein the integrated housing is sized for
being portable.

5. The video conferencing system of claim 2, further comprising an electronic pan tilt zoom
system for electronically manipulating the image signals to effectively provide at least one of 
variable pan, tilt, and zoom functions. 

6. The video conferencing system of claim 5, wherein the image pickup device is a stationary
camera. 

7. The video conferencing system of claim 5, wherein the multimodal integrated architecture 
system provides control signals to the electronic pan tilt zoom system.

8. The video conferencing system of claim 7, wherein the audio source moves relative to
the reference point, the audio source localization system detects the movement of the audio
source, and, in response to the movement, the audio source localization system causes a change in
the field of view of the image pickup device.


9. The video conferencing system of claim 5, wherein the audio pickup device is comprised of
an array of two microphones.

10. A method comprising the steps of:

generating, at an image pickup device, image signals representative of an image;

generating, at an audio pickup device, audio signals representative of sound from an audio
source;

processing the image signals and the audio signals to determine a direction of the audio
source relative to a reference point;

manipulating the image signals to produce refined image signals; and

outputting said refined image signals.

11. The method of claim 10 further comprising the steps of:

applying said audio signals to an audio source localization system;

applying said image signals to a computer vision person detection system;

processing said audio signals and said image signals with a multimodal speaker detection
system;

generating control signals based on the determined direction of the audio source;

applying the control signals to an electronic pan tilt zoom system to mimic the effect of at
least one function of a movable camera, said function selected from the group consisting of
panning, tilting, and zooming said movable camera; and

providing an output from said electronic pan tilt zoom system.

12. The method of claim 10, further comprising electronically varying a field of view of the
image pickup device in response to the control signals.

13. The method of claim 10, wherein processing the audio signals includes determining an
audio based direction of the audio source based on the audio signals.

14. The method of claim 12, wherein the audio source moves relative to a reference point, 
and wherein processing the audio signals further includes: 

detecting the movement of the audio source; and 

causing electronically, in response to the movement, an increase in the field of view of the
image pickup device.

15. The method of claim 12, further comprising the step of supplying control signals, based on 
the audio based direction, for electronically panning, tilting, or zooming said image pickup 
device.

16. A video conferencing system comprising:

two microphones for generating audio signals representative of sound from a speaker; 

a video camera for generating video signals representative of a video image; 

an electronic pan tilt zoom system for manipulating video images to produce the visual
effects of panning, tilting, and/or zooming; 

a processor for processing the video signals and the audio signals to determine a direction 
of a speaker relative to a reference point and supplying control signals to the electronic pan tilt 
zoom system for producing images that include the speaker in the field of view of the camera, the 
control signals being generated based on the determined direction of the speaker; and 

a transmitter for transmitting audio and video signals for video conferencing.
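
Illustrative sketch (not part of the claims). The claims recite determining a direction of the audio source from two microphones and supplying control signals to an electronic pan tilt zoom system, but they do not specify how. The Python sketch below shows one possible reading under stated assumptions: a far-field source, a time-difference-of-arrival estimate obtained by brute-force cross-correlation, and a linear mapping from angle to a horizontal crop window of a stationary wide-angle camera. Every function name, the speed-of-sound constant, and the sign convention (positive angle toward the left microphone, which is assumed to correspond to the left edge of the image) are hypothetical choices made for illustration. The image-based person detection and multimodal fusion recited in the claims are omitted.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the claims


def estimate_delay(left, right, max_lag):
    """Return the lag in samples at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    Uses brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_from_delay(lag, sample_rate, mic_spacing):
    """Convert a sample lag to a far-field angle in degrees.

    0 is straight ahead; positive angles point toward the left microphone
    (hypothetical convention).
    """
    path_difference = SPEED_OF_SOUND * lag / sample_rate
    # Clamp so that noise cannot push the ratio outside asin's domain.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))
    return math.degrees(math.asin(ratio))


def crop_window(angle_deg, frame_width, crop_width, camera_fov_deg):
    """Map a direction to a horizontal crop (left, right) in pixels.

    Cropping a stationary camera's frame stands in for electronic panning.
    """
    half_fov = camera_fov_deg / 2.0
    ratio = max(-1.0, min(1.0, angle_deg / half_fov))
    center = frame_width / 2.0 - ratio * frame_width / 2.0
    left = int(round(center - crop_width / 2.0))
    # Keep the window inside the captured frame.
    left = max(0, min(frame_width - crop_width, left))
    return left, left + crop_width
```

In use, the lag from `estimate_delay` would be passed to `direction_from_delay`, and the resulting angle to `crop_window`, whose output plays the role of a control signal for the electronic pan function. A practical system would more likely use a noise-robust correlation method and combine the audio estimate with the person detection output before moving the window.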